This notebook outlines my process for building tree-based and neural network models. It depends on the data table gameInfo generated from DataExtraction.RMD.
Packages
library(tidyverse)
Registered S3 methods overwritten by 'dbplyr':
method from
print.tbl_lazy
print.tbl_sql
-- Attaching packages --------------------------------------------------------------------------------- tidyverse 1.3.1 --
v ggplot2 3.3.5 v purrr 0.3.4
v tibble 3.1.2 v dplyr 1.0.7
v tidyr 1.1.3 v stringr 1.4.0
v readr 1.4.0 v forcats 0.5.1
-- Conflicts ------------------------------------------------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
Warning message:
In read_python_versions_from_registry("HCU", key = "PythonCore") :
Unexpected format for PythonCore version: 3.10
library(data.table)
data.table 1.14.0 using 4 threads (see ?getDTthreads). Latest news: r-datatable.com
Attaching package: ‘data.table’
The following objects are masked from ‘package:dplyr’:
between, first, last
The following object is masked from ‘package:purrr’:
transpose
library(randomForest)
Warning: package ‘randomForest’ was built under R version 4.1.2
randomForest 4.6-14
Type rfNews() to see new features/changes/bug fixes.
Attaching package: ‘randomForest’
The following object is masked from ‘package:dplyr’:
combine
The following object is masked from ‘package:ggplot2’:
margin
library(rpart.plot)
Warning: package ‘rpart.plot’ was built under R version 4.1.2
Loading required package: rpart
Warning: package ‘rpart’ was built under R version 4.1.2
library(word2vec)
Warning: package ‘word2vec’ was built under R version 4.1.2
library(Rtsne)
Warning: package ‘Rtsne’ was built under R version 4.1.2
library(plotly)
Warning: package ‘plotly’ was built under R version 4.1.2
Attaching package: ‘plotly’
The following object is masked from ‘package:ggplot2’:
last_plot
The following object is masked from ‘package:stats’:
filter
The following object is masked from ‘package:graphics’:
layout
library(keras)
Warning: package ‘keras’ was built under R version 4.1.2
library(tfruns)
Warning: package ‘tfruns’ was built under R version 4.1.2
library(rsample)
Warning: package ‘rsample’ was built under R version 4.1.2
Loading Data Generated by DataExtraction.RMD
load("../data/league.RDATA")
List to Store Results
data.tree <- list(
models = list(),
plots = list(),
temp.data = list()
)
championCluster <- list(
models = list(),
plots = list(),
temp.data = list()
)
Wrangling Data
I want to build a basic tree classifier that predicts which team composition wins. For now, the model uses only the number of champions per tag (Assassin, Fighter, Marksman, Tank, Mage, Support) on each team.
Setting up Training / Test Data
# Setting Seed for Reproducibility
set.seed(3)
data.tree$temp.data$sample <- sample(data.tree$temp.data$gameInfo.tree$match, nrow(data.tree$temp.data$gameInfo.tree)*.7)
data.tree$temp.data$train <- data.tree$temp.data$gameInfo.tree %>%
filter(match %in% data.tree$temp.data$sample)
data.tree$temp.data$test <- data.tree$temp.data$gameInfo.tree %>%
filter(!match %in% data.tree$temp.data$sample)
Generating Random Forest
set.seed(3)
data.tree$models$teamComp_forest <- randomForest(
team_win ~ . - match,
data = data.tree$temp.data$train,
ntree = 500,
importance = TRUE,
na.action = na.omit
)
data.tree$models$teamComp_forest
Call:
randomForest(formula = team_win ~ . - match, data = data.tree$temp.data$train, ntree = 500, importance = TRUE, na.action = na.omit)
Type of random forest: classification
Number of trees: 500
No. of variables tried at each split: 3
OOB estimate of error rate: 50.08%
Confusion matrix:
      1     2 class.error
1 11396 12530   0.5236981
2 11377 12438   0.4777241
importance(data.tree$models$teamComp_forest)
                    1          2 MeanDecreaseAccuracy MeanDecreaseGini
Assassin_1  9.9540136  -6.814106            5.1561345         332.2679
Fighter_1  17.2205799 -13.858278            5.5311721         346.1510
Marksman_1  9.5225581  -8.116976            1.7123134         293.4874
Tank_1      7.4799838  -4.724412            3.9559451         285.2230
Mage_1     16.9019553 -14.924233            2.9464378         284.1115
Support_1  12.2424356 -11.952428            1.8623555         242.4686
Assassin_2  0.3371864  -2.088303           -2.3073435         359.3653
Fighter_2   2.4827423  -4.897604           -2.7091299         425.1212
Marksman_2  3.2506393  -4.942703           -1.7623174         369.9343
Tank_2      3.0022087  -5.144555           -2.4517398         338.7178
Mage_2      1.0833946  -1.633843           -0.6288478         389.8527
Support_2   0.6855808  -5.485852           -5.9403592         288.7111
varImpPlot(data.tree$models$teamComp_forest)

Let’s compare this to a naive classifier that always predicts a blue-side win:
data.tree$temp.data$gameInfo.tree %>%
count(team_win) %>%
mutate(n = n/sum(n))
Well, with an OOB error rate of 50.08%, the forest is essentially on par with the naive blue-side-wins classifier, so the number of champions per tag clearly isn’t a strong predictor of team success. With the current coding, I’m fairly certain that no robust classifier will come out of it.
Let’s try to identify clusters of champion types.
Generating Input Team Sentences
Generating Model
There are pretty clearly 5 main clusters of champions, each corresponding to a role. That doesn’t help much in determining team compositions. I could set up a KNN to verify the clusters, but they seem clear-cut to me.
Neural Network
Wrangle Data
data.NN <- list()
data.NN$data.temp <- championCluster$temp.data$teams %>%
select(!match)
data.NN$data.temp
Running Model - See TeamCompNN.R
Hyperparameter Tuning
runs <- tuning_run(
"TeamCompNN.R",
flags = list(
dropout = c(0.2, 0.3, 0.4, 0.5),
unit = c(8, 16, 64)
)
)
runs %>%
arrange(desc(metric_val_accuracy))
# So a dropout of 0.3 and an 8-unit dense network seems to produce the best validation accuracy
results
loss accuracy
0.6919282 0.5214252
Around 52% accuracy: not the best, but not bad considering the variance of League of Legends.
Saving Model
save_model_tf(model, "initialNN.tf")
2021-12-21 18:54:50.959358: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
Evaluating Example Team
model %>% predict("Sett Trundle Kindred Ziggs Leona")
[,1]
[1,] 0.5169869
A very weird way to code a team comp predictor - I’ll try a different method in Part 3.
---
title: "Trees and Neural Networks"
output: html_notebook
---

This notebook outlines my process for building tree-based and neural network models. It depends on the data table gameInfo generated from DataExtraction.RMD.

# Packages
```{r}
library(tidyverse)
library(data.table)
library(randomForest)
library(rpart.plot)
library(word2vec)
library(Rtsne)
library(plotly)
library(keras)
library(tfruns)
library(rsample)
```

# Loading Data Generated by DataExtraction.RMD
```{r}
load("../data/league.RDATA")
```

# List to Store Results
```{r}
data.tree <- list(
  models = list(),
  plots = list(),
  temp.data = list()
)
championCluster <- list(
  models = list(),
  plots = list(),
  temp.data = list()
)
```


# Wrangling Data
I want to build a basic tree classifier that predicts which team composition wins. For now, the model uses only the number of champions per tag (Assassin, Fighter, Marksman, Tank, Mage, Support) on each team.
```{r}
data.tree$temp.data$gameInfo.temp <- gameInfo %>% 
  left_join(
    champions.scraped,
    by = c("championName" = "name")
  ) %>% 
  group_by(match) %>% 
  mutate(
    team = rleid(win)
  ) %>% 
  ungroup()

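data.tree$temp.data$gameInfo.tags <- data.tree$temp.data$gameInfo.temp %>% 
  group_by(match, team) %>% 
  count(tag) %>% 
  ungroup() %>% 
  # One row per match/team, one column per tag holding the champion count
  pivot_wider(
    names_from = tag,
    values_from = n
  ) %>% 
  # Tags that a team does not field come through as NA; treat them as zero
  replace(is.na(.), 0) 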


data.tree$temp.data$gameInfo.tree <- data.tree$temp.data$gameInfo.temp %>% 
  filter(win == TRUE) %>% 
  select(match, team_win = team) %>% 
  distinct(match, .keep_all = T) %>% 
  mutate(
    team_win = factor(team_win, levels = c(1, 2))
  ) %>% 
  left_join(
    data.tree$temp.data$gameInfo.tags %>% 
      filter(team == 1) %>% 
      rename_with(
        .fn = function(x){
          
          paste0(x, "_1") %>% 
            return()
          
        },
        .cols = 3:8
      ) %>% 
      select(!team),
    by = "match"
  ) %>% 
  left_join(
    data.tree$temp.data$gameInfo.tags %>% 
      filter(team == 2) %>% 
      rename_with(
        .fn = function(x){
          
          paste0(x, "_2") %>% 
            return()
          
        },
        .cols = 3:8
      ) %>% 
      select(!team),
    by = "match"
  ) %>% 
  mutate_if(is.integer, as.factor)

data.tree$temp.data$gameInfo.tree
```
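
To make the shape of the resulting table concrete, here is a minimal base-R sketch of the same reshaping on a made-up match. The `toy` data frame and its tags are assumptions for illustration only; the notebook itself works from `gameInfo` and `champions.scraped`.

```r
# Hypothetical toy data: one match, two teams, one tag per champion
toy <- data.frame(
  match = "M1",
  team  = rep(c(1, 2), each = 5),
  tag   = c("Fighter", "Fighter", "Marksman", "Mage", "Support",
            "Tank", "Assassin", "Marksman", "Mage", "Support")
)

# Champion count per team and tag (rows: team, columns: tag)
counts <- xtabs(~ team + tag, data = toy)

# Flatten into one row with <tag>_<team> columns, mirroring gameInfo.tree
wide_row <- c(
  setNames(counts["1", ], paste0(colnames(counts), "_1")),
  setNames(counts["2", ], paste0(colnames(counts), "_2"))
)
wide_row
```

Each match ends up as a single row of twelve counts, six per team, which is the predictor layout the forest below is trained on.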

# Setting up Training / Test Data
```{r}
# Setting Seed for Reproducibility
set.seed(3)
# Next time use rsample 
data.tree$temp.data$sample <- sample(data.tree$temp.data$gameInfo.tree$match, nrow(data.tree$temp.data$gameInfo.tree)*.7)
data.tree$temp.data$train <- data.tree$temp.data$gameInfo.tree %>% 
  filter(match %in% data.tree$temp.data$sample)
data.tree$temp.data$test <- data.tree$temp.data$gameInfo.tree %>% 
  filter(!match %in% data.tree$temp.data$sample)
```
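
The comment in the chunk above suggests using rsample next time. A minimal sketch of what that might look like, on a hypothetical stand-in table (`toy_games` is made up here; the stratification on `team_win` is my assumption, not something the notebook does):

```r
library(rsample)

set.seed(3)
# Hypothetical stand-in for gameInfo.tree: one row per match
toy_games <- data.frame(
  match    = sprintf("M%04d", 1:1000),
  team_win = factor(sample(c(1, 2), 1000, replace = TRUE), levels = c(1, 2))
)

# 70/30 split, stratified so both sets keep a similar win ratio
games_split <- initial_split(toy_games, prop = 0.7, strata = team_win)
train <- training(games_split)
test  <- testing(games_split)

# Every match lands in exactly one of the two sets
nrow(train) + nrow(test) == nrow(toy_games)
```

Compared with sampling match IDs by hand, the split object keeps the training and test rows tied to one seed and one call.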

# Generating Random Forest
```{r}
set.seed(3)
data.tree$models$teamComp_forest <- randomForest(
  team_win ~ . - match,
  data = data.tree$temp.data$train,
  ntree = 500,
  importance = TRUE,
  na.action = na.omit
)

data.tree$models$teamComp_forest
```
```{r}
importance(data.tree$models$teamComp_forest)
varImpPlot(data.tree$models$teamComp_forest)
```
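
The chunks above only report the out-of-bag error, while the held-out test set goes unused. A rough sketch of how a held-out check could look, run here on simulated data whose winner is independent of the predictors (all object names below are hypothetical):

```r
library(randomForest)

set.seed(3)
# Hypothetical toy data: two tag counts and a winner that ignores them
n <- 400
toy <- data.frame(
  Fighter_1 = factor(sample(0:3, n, replace = TRUE)),
  Fighter_2 = factor(sample(0:3, n, replace = TRUE)),
  team_win  = factor(sample(c(1, 2), n, replace = TRUE), levels = c(1, 2))
)
idx       <- sample(n, size = floor(0.7 * n))
toy_train <- toy[idx, ]
toy_test  <- toy[-idx, ]

toy_forest <- randomForest(team_win ~ ., data = toy_train, ntree = 100)

# Predict the held-out rows and compare against the true winner
pred      <- predict(toy_forest, newdata = toy_test)
confusion <- table(truth = toy_test$team_win, predicted = pred)
accuracy  <- mean(pred == toy_test$team_win)
```

With the real objects, the same `predict()` call on `data.tree$temp.data$test` would give a test-set accuracy to set beside the OOB estimate.
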
Let's compare this to a naive classifier that always predicts a blue-side win:
```{r}
data.tree$temp.data$gameInfo.tree %>% 
  count(team_win) %>% 
  mutate(n = n/sum(n))
```
Well, with an OOB error rate of roughly 50%, the forest is essentially on par with the naive blue-side-wins classifier, so the number of champions per tag clearly isn't a strong predictor of team success. With the current coding, I'm fairly certain that no robust classifier will come out of it.
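
To put numbers on that comparison, the OOB confusion matrix printed by the forest can be turned into an accuracy and a majority-class baseline. A minimal sketch, with the four counts copied from the forest output; note that these are training-set OOB counts, so the baseline here may differ slightly from the one computed on the full `gameInfo.tree` above.

```r
# OOB confusion matrix from the forest output (rows = truth, cols = prediction)
oob <- matrix(c(11396, 12530,
                11377, 12438),
              nrow = 2, byrow = TRUE,
              dimnames = list(truth = c("1", "2"), predicted = c("1", "2")))

oob_accuracy <- sum(diag(oob)) / sum(oob)     # share of correct OOB predictions
baseline_acc <- max(rowSums(oob)) / sum(oob)  # always predict the more frequent winner

round(c(forest_oob = oob_accuracy, baseline = baseline_acc), 4)
# forest_oob   baseline
#     0.4992     0.5012
```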

Let's try to identify clusters of champion types.

# Generating Input Team Sentences
```{r}
championCluster$temp.data$teams <- gameInfo %>% 
  select(match, win, championName) %>% 
  group_by(match, win) %>% 
  mutate(championNumber = row_number()) %>% 
  pivot_wider(
    names_from = championNumber,
    values_from = championName
  ) %>% 
  transmute(match = match, win = win, team = str_c(`1`,`2`,`3`,`4`,`5`, sep = " ")) %>% 
  ungroup() 

championCluster$temp.data$teams
write_csv(championCluster$temp.data$teams, "../data/teamNames.csv")
```
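
As an illustration of what a "team sentence" is, the base-R sketch below collapses two made-up teams into one string each. The `toy` rows are hypothetical, and the sketch assumes each champion name is a single token without spaces, since a name containing a space would presumably be treated as two words by the embedding model.

```r
# Hypothetical toy rows in the shape of gameInfo (match, win, championName)
toy <- data.frame(
  match = "M1",
  win   = rep(c(TRUE, FALSE), each = 5),
  championName = c("Sett", "Trundle", "Kindred", "Ziggs", "Leona",
                   "Garen", "Vi", "Ahri", "Jinx", "Thresh")
)

# Collapse each team's five champions into one space-separated "sentence"
teams <- aggregate(championName ~ match + win, data = toy,
                   FUN = paste, collapse = " ")
names(teams)[3] <- "team"
teams
```

Treating a team as a sentence and a champion as a word is what lets a word-embedding model be reused unchanged on draft data.
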
# Generating Model
```{r}
set.seed(3)
championCluster$models$nlpModel <- word2vec(
  x = championCluster$temp.data$teams$team, 
  type = "skip-gram", 
  dim = 20, 
  iter = 15
)

# Embedding Matrix
championCluster$models$embeddingMatrix <- as.matrix(championCluster$models$nlpModel)

# Applying TSne 
championCluster$models$Tsne <- Rtsne(championCluster$models$embeddingMatrix, pca = FALSE)

championCluster$plots$map <- championCluster$models$Tsne$Y %>% 
  as.data.frame() %>%
  mutate(champion = row.names(championCluster$models$embeddingMatrix)) %>%
  ggplot(aes(x = V1, y = V2, label = champion)) + 
  geom_point() 

championCluster$plots$map <- championCluster$plots$map %>% 
  ggplotly()

championCluster$plots$map 
```
There are pretty clearly 5 main clusters of champions, each corresponding to a role. That doesn't help much in determining team compositions. I could set up a KNN to verify the clusters, but they seem clear-cut to me.
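
One way the KNN check mentioned above could be set up is a leave-one-out nearest-neighbour test: for each champion, look up its closest neighbour in embedding space and see whether the two share a role. The sketch below runs on a simulated embedding matrix with made-up role labels; the notebook itself has no per-champion role column, so that part is an assumption.

```r
set.seed(3)
# Hypothetical stand-in for the embedding matrix: 5 roles x 20 champions, 20 dims
roles   <- rep(c("Top", "Jungle", "Mid", "Bottom", "Support"), each = 20)
centers <- matrix(rnorm(5 * 20, sd = 5), nrow = 5)  # one center per role
emb     <- centers[rep(1:5, each = 20), ] +
  matrix(rnorm(100 * 20, sd = 0.5), nrow = 100)     # champions scattered around centers

# Leave-one-out 1-nearest-neighbour on Euclidean distance
d <- as.matrix(dist(emb))
diag(d) <- Inf                        # a champion may not be its own neighbour
nearest   <- apply(d, 1, which.min)
agreement <- mean(roles[nearest] == roles)
agreement
```

An agreement close to 1 on the real embeddings would back up the visual impression from the t-SNE map; a value near 0.2 would mean the clusters are not role-driven.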

# Neural Network
## Wrangle Data
```{r}
data.NN <- list()
data.NN$data.temp <- championCluster$temp.data$teams %>% 
  select(!match)
  
data.NN$data.temp
```

# Running Model - See TeamCompNN.R
## Hyperparameter Tuning
```{r}
runs <- tuning_run(
  "TeamCompNN.R",
  flags = list(
    dropout = c(0.2, 0.3, 0.4, 0.5),
    unit = c(8, 16, 64)
  )
)

runs %>% 
  arrange(desc(metric_val_accuracy))
# So a dropout of 0.3 and an 8-unit dense network seems to produce the best validation accuracy
```
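
By default `tuning_run()` trains one model for every combination of the supplied flag values. A small base-R sketch of the grid this implies, using the same values as above:

```r
# Same flag values as passed to tuning_run() above
grid <- expand.grid(
  dropout = c(0.2, 0.3, 0.4, 0.5),
  unit    = c(8, 16, 64)
)

nrow(grid)  # 12 combinations, one training run each
```

With only 12 runs a full grid is cheap; for larger grids, the `sample` argument of `tuning_run()` can train a random subset instead.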

```{r}
results
```
Around 52% accuracy: not the best, but not bad considering the variance of League of Legends.

# Saving Model
```{r}
save_model_tf(model, "initialNN.tf")
```
```{r include = F}
model <- load_model_tf("./initialNN.tf")
```


# Evaluating Example Team
```{r}
model %>% predict("Sett Trundle Kindred Ziggs Leona")
```
A very weird way to code a team comp predictor - I'll try a different method in Part 3.
